Combine SEGY Files

 

Version 1.0

Geological Survey of Canada

Natural Resources Canada

March, 2009

 

 

Table of Contents

 

Introduction

Intellectual and Property Rights

Installing the Program

Launching the Program

Main Window : Loading files

Processing Window : Setup  Demultiplex Parameters

Main Window : Metadata Entry

Main Window : Operational Options

Main Window : Processing

Appendix A : Retrieving AGCDIG Data from 8 mm Exabyte Tape

 

 

Introduction

 

This application has been designed to load, demultiplex and combine digital SEGY files that are collected during marine field operations.

 

During these expeditions, digitizers are often set up to record in one-hour chunks or they are set up to record files of a predetermined file size, often around 100 MB. Consequently, each field day will generate over 20 files per seismic recorder, and typically 1 to 2 GB of data. This program will combine the multiple files from one day (or days) into one large SEGY file with a self-describing name.

 

 

 

Most marine geologists prefer to use an electrostatic hardcopy of the seismic data  and find it quite convenient to roll out a single record that typically contains data from multiple field days. They find it a challenge to process and display digital seismic data, as often the number of files and the size of files present operational difficulties. Consequently, most of the digital seismic data collected over the last decade by the GSC have not been interpreted or even verified from digital tape.

 

Current seismic recorders (e.g., GSCDIG) used in GSC field operations generally record a single seismic channel in each data file, and multiple data files are recorded simultaneously per instrument. In previous seismic recorders (e.g., AGCDIG), multiple channels were encoded in a single SEGY file, including an NMEA navigation datastream.

 

Other systems create non-SEGY native data streams. The Knudsen Chirp recorder generates files in a proprietary KEB format, which can be converted into SEGY format using a program that can be obtained from the manufacturer. Our new Klein sidescan records in the Klein SDF format; at the moment, we are working on a converter to SEGY.

 

This program implements the “harvest” stage of a processing framework implemented at the GSC for storage and dissemination of sonar data.

 

 

 

In subsequent programs, these combined SEGY files will be converted into JPEG 2000 files, which will offer substantial compression [more than 90%] and the means of easily viewing and interpreting these data with both off-the-shelf and custom image viewing software. This program will reduce the number of seismic data files by producing very large, combined SEGY files, and subsequent steps will make the handling of these very large data files not only possible but very easy.

 

Intellectual and Property Rights

 

This program is the sole property of the Government of Canada. All rights to modify and distribute this software are retained by the Government of Canada. To obtain a copy, contact the author, Bob Courtney, Natural Resources Canada, 1 Challenger Drive, Dartmouth, Nova Scotia B2Y 4A2, tel: 902-426-5062.

 

Installing the Program

 

The Combine SEGY application has been tested on Windows XP and will probably work on Vista as well. The only prerequisite for running this application is the presence of .NET Framework 2.0, which can be downloaded from Microsoft if it is not already installed on your machine. I believe the application setup will point you in the right direction if this component does not exist.

 

The Combine SEGY application is installed using standard Microsoft installation procedures, and can be installed from the NRN network from the address, \\s5-dar-data1\shared_software\Combine SGY\setup.exe  (outside of the GSCA local network, you may have to use \\192.55.224.44\shared_software\Combine SGY\setup.exe ). It will install the application to the C:\Program Files directory under a subdirectory called NRCAN. It will also install a link to the program through the Start->All Programs-> NRCan  path on the main taskbar.

 

A copy of the installation software may also be obtained from the author via e-mail contact to Bob Courtney.

 

Note:

 

If you are installing a new version of this program, you must first remove the previous version using the Add or Remove Programs utility in the Control Panel. The program is installed under the name CombineSegy.

 

Launching the Program

 

The program can be launched by clicking on the Combine_Segy icon located under the NRCan folder found in the programs menu from the start button [see figure 1].

 

Figure 1 – Main application form.

 

 

Main Window : Loading files

 

Three buttons control the loading and deletion of files selected for processing [see figure 2].

 

Figure 2 – File loading and management buttons.

 

The Load Segy Files button will bring up a file selection box, and multiple files may be chosen using the standard Windows shift-click syntax. By default, only files with a “.sgy” extension will be displayed in the file box, but all files can be displayed by using the pulldown option in the type of files selector box.

 

When loading files into the box, it is normally not critical to choose only SEGY files as each file is tested to see if it is in the proper format. There may be exceptions, of course, where the odd file may pass this test, so it is worthwhile to inspect the files that are placed in the list box. The choice of files is not restricted to one directory and multiple files from different directories can be selected. The order of the files does not matter, but choose only files of the same recording type.

 

For subsequent operations, it is important not to have the same file selected more than once. If a file is listed more than once, or if a non-SEGY file passes the SEGY test, it can be deleted by clicking on the item in the list box and pressing the clear selected button. Multiple selections can be made with the standard Windows shift-click or control-click syntax.

 

The entire list can be deleted by clicking the clear the entire list button.


 

Processing Window : Setup  Demultiplex Parameters

 

For many of our newer data sets, each input file contains only one channel of seismic data and the files do not need to be demultiplexed. However, in older data files and in industry multichannel data sets, it is necessary to choose the desired channels. To initiate this process, click the button on the main form labeled “Setup Demultiplex Parameters”, which will expose the following form (see figure 3):

 

Figure 3 – Demultiplex Form.

 

Click the “scan selected file” button and the program will scan the first hundred traces in the file highlighted in the main form list box and display the data corresponding to the columns labeled in the demultiplex form. By toggling back and forth between the demultiplex form and the main window form, one can scan the contents of any of the files present in the main form list box.

 

Typically, industry-standard SEGY multichannel data will place the channel code information in the trace ID position in the trace header [col 2], whereas a nonstandard position was used to encode the channel ID in the AGCDIG format [col 3].

 

There is a user-defined field option to choose a variable position in the 240-byte SEGY trace header if channel information is located elsewhere in the header. In this case, one would specify the start position of the data in the trace header, counting from the start of the 240-byte trace header. One also needs to specify the length of the word in bytes; this program assumes that the data are in unsigned integer format. Note that this option has not been well tested; if a user has problems with it, please contact the author so it can be fixed.
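The user-defined field can be illustrated with a minimal Python sketch. This is not the program's own code: the function name read_header_field is hypothetical, the start position is assumed to be counted from 1, and the value is assumed to be stored big-endian, as is usual for SEGY.

```python
TRACE_HEADER_BYTES = 240  # length of a SEGY trace header


def read_header_field(trace_header: bytes, start: int, length: int) -> int:
    """Read an unsigned integer of `length` bytes from a trace header.

    `start` is assumed to be 1-based (byte 1 is the first header byte) and
    the value is assumed to be big-endian; both are assumptions of this sketch.
    """
    if len(trace_header) != TRACE_HEADER_BYTES:
        raise ValueError("a SEGY trace header must be exactly 240 bytes")
    if start < 1 or length < 1 or start - 1 + length > TRACE_HEADER_BYTES:
        raise ValueError("field does not fit inside the trace header")
    offset = start - 1
    return int.from_bytes(trace_header[offset:offset + length],
                          byteorder="big", signed=False)


# Usage: store the value 3 in bytes 29-30 (the standard trace
# identification code position) and read it back.
header = bytearray(TRACE_HEADER_BYTES)
header[28:30] = (3).to_bytes(2, "big")
channel = read_header_field(bytes(header), start=29, length=2)
```

If a file stores the channel code with a different byte order or numbering convention, the byteorder argument and the offset calculation would need to change accordingly.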

 

 

To extract a single channel from the file, click on the “demultiplex 1 channel” check box. Set the channel field to the column in the scanned file listing where the channel information is contained. For example, one would input column 2 for most industry-standard SEGY data. One would choose column 3 for AGCDIG data. Then choose the appropriate channel number in the channel number scroll box.

 

 

Sometimes we want to extract two channels and output a composite file containing data from two combined traces. To choose this option, click on the “demultiplex sidescan” check box. Choose as before the field to designate the column in which channel information is stored. Choose the appropriate codes for the port [upper] and the starboard [lower] channels.

 

Examples of this would include sidescan data where port and starboard data are saved as separate channels, or high-resolution Huntec data  wherein internal and external data are recorded. In the case of sidescan data, the port channel should be flipped as is usual in the display of this kind of information. In the case of Huntec data, the internal and external channels can be stacked on top of each other in a composite section.

 

 In the composite section, the trace will be double the trace length of the individual channels. The composite trace will have the port data first, followed by the starboard data.

 

The trace header will be an exact replica of the port channel trace header except that the number of samples will be doubled. Thus, the delay stored in the composite trace header will be that of the port data channel. It is assumed that the port and starboard channels have the same trace length and the same sampling rate.
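The layout of a composite trace can be sketched in Python as follows. This is only an illustration of the description above, not the program's code. It assumes the sample count is stored as a big-endian 16-bit value in bytes 115-116 of the trace header (the standard SEGY position) and that samples have a fixed width; the names make_composite_trace, bytes_per_sample and flip_port are hypothetical.

```python
import struct

TRACE_HEADER_BYTES = 240
NS_OFFSET = 114  # zero-based offset of header bytes 115-116 (sample count)


def make_composite_trace(port_header: bytes, port_samples: bytes,
                         stbd_samples: bytes, bytes_per_sample: int = 2,
                         flip_port: bool = False) -> bytes:
    """Return port header (sample count doubled) + port data + starboard data."""
    if len(port_header) != TRACE_HEADER_BYTES:
        raise ValueError("trace header must be 240 bytes")
    if len(port_samples) != len(stbd_samples):
        raise ValueError("port and starboard traces must have equal length")
    if len(port_samples) % bytes_per_sample:
        raise ValueError("sample data is not a whole number of samples")
    n = len(port_samples) // bytes_per_sample
    if 2 * n > 0xFFFF:
        raise ValueError("doubled sample count does not fit in 16 bits")
    if flip_port:
        # Reverse the order of the samples, keeping each sample's bytes intact.
        port_samples = b"".join(
            port_samples[i:i + bytes_per_sample]
            for i in range(len(port_samples) - bytes_per_sample, -1,
                           -bytes_per_sample))
    header = bytearray(port_header)          # exact copy of the port header
    struct.pack_into(">H", header, NS_OFFSET, 2 * n)  # doubled sample count
    return bytes(header) + port_samples + stbd_samples
```

Because every other header field is copied unchanged, the delay in the composite header is that of the port channel, as noted above.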

 

 

Figure 4 – Sample demultiplex parameters for AGCDIG sidescan data.

 

When all demultiplex information has been entered into the form, click the “hide window” button so that the form contents cannot be mistakenly changed. The window can be viewed again by clicking the “Setup Demultiplex Parameters” button on the main form. The settings in the demultiplex form will have been retained.


 

Main Window : Metadata Entry

 

Figure 5 – Expedition-specific metadata.

 

The Combine Segy Files program will generate large composite segy files and  expedition specific metadata will be used to give self-describing names to these composite files. Although these fields are only used in the nomenclature of files, in subsequent processing these new file names will remain largely untouched.

 

Each composite file will have the following nomenclature:

 

expedition ID_datatype_instrument type_transducer type_start time_to_end time.sgy

 

For example,

 

2000ANNE_S_PIERCE_SIDESCAN_SIMRAD_120khz_112_1629_to_112_1959.sgy

 

was generated from sidescan data collected on the expedition 2000ANNE_S_PIERCE, with a SIMRAD instrument with the 120 kHz transducer type option, between Julian day 112, time 1629, and Julian day 112, time 1959. This filename standard will prove very convenient for organizing and sorting these files in a directory system.
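As an illustration of this naming convention, the following Python sketch builds a composite file name and recovers the time span from one. It is not taken from the program: the function names are hypothetical, and the zero-padding widths (three digits for the day, four for the time) are an assumption based on the single example above. Because an expedition ID may itself contain underscores, the sketch only splits off the time fields; the remaining prefix would have to be matched against the database names.

```python
import re


def composite_name(expedition, datatype, instrument, transducer, start, end):
    """Build a file name; `start` and `end` are (julian_day, hhmm) tuples."""
    return "{}_{}_{}_{}_{:03d}_{:04d}_to_{:03d}_{:04d}.sgy".format(
        expedition, datatype, instrument, transducer,
        start[0], start[1], end[0], end[1])


# The time fields sit at the end of the name, so parse from the right.
_TAIL = re.compile(
    r"^(?P<prefix>.+)_(?P<d1>\d{3})_(?P<t1>\d{4})"
    r"_to_(?P<d2>\d{3})_(?P<t2>\d{4})\.sgy$")


def parse_time_span(name):
    """Return (prefix, (start_day, start_hhmm), (end_day, end_hhmm))."""
    m = _TAIL.match(name)
    if m is None:
        raise ValueError("not a composite file name: " + name)
    return (m.group("prefix"),
            (int(m.group("d1")), int(m.group("t1"))),
            (int(m.group("d2")), int(m.group("t2"))))


name = composite_name("2000ANNE_S_PIERCE", "SIDESCAN", "SIMRAD", "120khz",
                      (112, 1629), (112, 1959))
prefix, start, end = parse_time_span(name)
```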

 

Since most of the entry fields in figure 5 correspond directly to enumerated data (fixed-form strings from a limited list) contained in GSC's archive databases [ED and PAD], it is highly desirable to choose these variables based on the names that actually exist in our databases. These files may then be directly related to ancillary data found in these databases by parsing their file names. The only field that is not in our databases, and therefore has a freer format, is the transducer type field, which I consider somewhat of an oversight in the design of our databases. I would ask the user to at least be consistent with the information typed into the transducer type field.


 Each entry name can be manually entered into the corresponding combo box, but the list of acceptable values for these fields can be accessed by clicking on the down arrow in each box. Some of the lists are long so one can quickly access a desired name by typing the first few characters of the desired field into the text part of the combo box. At this time, the list box should scroll showing entries that match the characters typed into the text box.

 

 The entries in the pull-down list were extracted from our databases in 2007 so newer expedition IDs, for example, will not be found. It is possible to directly derive the contents of these pulldown lists from our online databases as the program is launched, but that would require the installation of Oracle drivers on each of the target PCs and, more importantly, the use of the program at sea would be hindered.

 

 

Main Window : Operational Options

 

Figure 6 – Operational options.

 

In theory, there is no practical limit on the size of the files generated through this process. However, these routines have not been tested with data files greater than 4 GB in size; we typically collect about 2 GB per channel per 24-hour period. The data entered in the operational options section control the maximum size and duration of the composite files. They also control whether or not the file contains whitespace [null traces] when time gaps exist between successive input files.

 

I would recommend setting the maximum combined file size variable to 2000 [2 GB] and the maximum combined time duration to 24 hours. By default, null traces are generated when small gaps exist between successive files, but if the time gap exceeds 10 minutes the composite file will be closed and a new one started. The operator can choose the maximum time gap through the scroll box.
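The way these three limits interact can be sketched in Python. This is a simplified illustration under stated assumptions, not the program's actual logic: the names InputFile and plan_composites are hypothetical, times are expressed in minutes from an arbitrary origin, and the space taken by null traces is ignored when sizes are summed.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class InputFile:
    name: str
    start_min: float   # start time, in minutes
    end_min: float     # end time, in minutes
    size_mb: float


def plan_composites(files: List[InputFile], max_size_mb: float = 2000,
                    max_hours: float = 24,
                    max_gap_min: float = 10) -> List[List[InputFile]]:
    """Group time-sorted input files into composite files."""
    groups: List[List[InputFile]] = []
    current: List[InputFile] = []
    for f in sorted(files, key=lambda item: item.start_min):
        if current:
            gap = f.start_min - current[-1].end_min
            size = sum(x.size_mb for x in current) + f.size_mb
            hours = (f.end_min - current[0].start_min) / 60.0
            # Close the composite when any limit would be exceeded.
            if gap > max_gap_min or size > max_size_mb or hours > max_hours:
                groups.append(current)
                current = []
        current.append(f)
    if current:
        groups.append(current)
    return groups
```

With the recommended settings, two files separated by a two-minute gap would fall into the same composite, while a file that starts more than ten minutes after the previous one ends would begin a new composite.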

 

The final checkbox instructs the program to demultiplex the data only, and not attempt to combine successive files. This option must be used if the time fields in the SEGY header are empty, as there is then no way of sequencing the files [at least by time].

 

An example that I've seen would be industry multichannel data where only the shot number or trace number has been recorded, and one wishes to pull a signal from a selected channel or pair of channels. In this case, the metadata tags in figure 5 are ignored and the output file has the same name as the input file, with the chosen channel number appended to the end of the file name.

 

 

Main Window : Processing

 

Figure 7 – Main window configured and ready to process.

 

At this point, you are ready to process the files. Click on the process files button, and browse the directory system for a place to put your composite files. The program will first scan through all the files in the input list box and reorder the list by increasing time. During the construction of the composite files, a temporary file called SegyOut.sgy will be present, but it will be renamed with the appropriate nomenclature, as discussed in the metadata section.

 

You're done.

 

Appendix A

 

Retrieving AGCDIG Data from 8 mm Exabyte Tape

 

From about 1992 to the late 1990s, marine geophysical seismic data were digitized by software developed by the Geological Survey of Canada Atlantic called AGCDIG. This digitizer could record up to four simultaneous channels with the same delay trigger. Data were recorded in a modified SEGY format on 8 mm Exabyte tape drives in both 8200 and 8500 formats.

 

The data were multiplexed in an interleaved fashion on tape, and the channel ID was stored in a nonstandard position in the trace header. In the later versions of AGCDIG, an additional channel was recorded that encoded a serial NMEA stream from the GPS serial feed. This additional channel usually had a trace length of 1024, which often differed from the trace length recorded for each of the other channels. Consequently, any program that decodes or demultiplexes these SEGY files cannot assume a constant trace length, nor can it assume standard locations for variables in the trace header. C language headers describing the modified SEGY format for these tapes are available from the author.

 

The data were recorded on Exabyte tape with a small block size of less than 32K. These files can be easily dumped from tape onto a hard drive on a Linux system using the following procedure:

 

[1] Insert the tape into the target Exabyte drive. Tapes may be read in Exabyte 8500 through 8900 [Mammoth] tape drives. If using a Mammoth drive, the drive must be cleaned before high density Mammoth tapes can be read. Set the tape drive to use a variable block length:

 

                        mt -t /dev/nst0 setblk 0

 

 

[2] Pretension the tape by issuing the following commands (here we assume that the tape device name in a no rewind mode is /dev/nst0 ) :

 

                        mt -t /dev/nst0 eod

                        mt -t /dev/nst0 rewind

 

[3] Skip forward to the desired file. Often the first file is empty so the first significant data is found after the second file mark:

 

                        mt -t /dev/nst0 fsf 1

 

[4] Use the Linux command dd to dump the file into the current working directory:

                        dd if=/dev/nst0 of=NewFileName ibs=32k

 

The above procedure can be incorporated into a Linux-based C-shell script that dumps successive files from the tape into a directory:

 

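#!/bin/csh
# Dump up to 100 successive tape files into the current directory.

# Rewind, select variable block length, and skip the (often empty) first file.
mt -t /dev/nst0 rewind
mt -t /dev/nst0 setblk 0
mt -t /dev/nst0 fsf 1

setenv count 0
while ( $count < 100 )
    setenv count `expr $count + 1`
    setenv f file${count}
    echo $f
    # Each dd call reads up to the next file mark on the no-rewind device.
    dd if=/dev/nst0 ibs=32k of=$f
end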

 

 

The above procedure will generate 100 files [modify the script if there are more than 100 files on the tape], many of which will of course be empty. This is not a problem, as the Combine Segy program will ignore any files that are empty or not in SEGY format.
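If one prefers to tidy the dump directory before loading it into the program, the non-empty files can be listed with a short Python sketch such as the one below. This is an optional convenience, not part of the Combine Segy program; the function name nonempty_dumps and the min_bytes parameter are hypothetical.

```python
import os
from typing import List


def nonempty_dumps(directory: str, min_bytes: int = 1) -> List[str]:
    """Return the sorted names of regular files holding at least `min_bytes`."""
    kept = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path) and os.path.getsize(path) >= min_bytes:
            kept.append(name)
    return kept
```

For example, nonempty_dumps(".") lists the files in the current working directory that contain at least one byte. Note that the names are sorted alphabetically, so file10 is listed before file2.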

 

Ensure that data from successive tapes are stored in separate directories.